Introduction to Distributional Regression

Practical session 2

Gillian Heller

NHMRC Clinical Trials Centre, University of Sydney

gillian.heller@sydney.edu.au

Practical session 2: Diagnostics

Plasma data set

data(plasma)
plasma |>
  filter(alcohol<200) -> plasma

Lasso models

m.lasso <- gamlss(betadiet ~ fiber, data=plasma, family=BCTo, trace = FALSE)
m.lasso.adaptive2 <- gamlss(betadiet ~ log(fiber), data=plasma, family=BCTo, trace = FALSE)

Stepwise model

m.step <- gamlss(betadiet ~ pb(fiber) + pb(cholesterol) + fat + factor(smokstat),
                 sigma.formula =~ cholesterol,
                 nu.formula =~ 1, 
                 tau.formula =~ 1,
                 family = BCTo, data=plasma, trace = FALSE)

Model selection criteria

AIC(m.step, m.lasso, m.lasso.adaptive2)
                        df      AIC
m.step            18.08183 5217.076
m.lasso.adaptive2  5.00000 5228.318
m.lasso            5.00000 5242.114


BIC(m.step, m.lasso, m.lasso.adaptive2)
                        df      BIC
m.step            18.08183 5284.872
m.lasso            5.00000 5260.861
m.lasso.adaptive2  5.00000 5247.065

Diagnostic plots

plot(m.step)

******************************************************************
          Summary of the Quantile Residuals
                           mean   =  -0.001549316 
                       variance   =  1.003283 
               coef. of skewness  =  0.005100953 
               coef. of kurtosis  =  2.990256 
Filliben correlation coefficient  =  0.9976408 
******************************************************************

Diagnostic plots

plot(m.lasso)

******************************************************************
          Summary of the Quantile Residuals
                           mean   =  -0.0004199963 
                       variance   =  1.003924 
               coef. of skewness  =  0.007600763 
               coef. of kurtosis  =  2.988339 
Filliben correlation coefficient  =  0.9980725 
******************************************************************

Diagnostic plots

plot(m.lasso.adaptive2)

******************************************************************
          Summary of the Quantile Residuals
                           mean   =  -0.0009406445 
                       variance   =  1.003896 
               coef. of skewness  =  0.01505683 
               coef. of kurtosis  =  3.005563 
Filliben correlation coefficient  =  0.9976675 
******************************************************************

Worm plots

par(mfrow=c(2,2))
wp(m.step)
title("m.step")
wp(m.lasso)
title("m.lasso")
wp(m.lasso.adaptive2)
title("m.lasso.adaptive2")

Combined worm plots

model_wp(m.step, m.lasso, m.lasso.adaptive2)

Worm plot by covariate

wp(m.step, xvar=fiber, n.inter=6)
number of missing points from plot= 0  out of  52 
number of missing points from plot= 0  out of  54 
number of missing points from plot= 0  out of  52 
number of missing points from plot= 0  out of  51 
number of missing points from plot= 0  out of  53 
number of missing points from plot= 0  out of  52 

Term plots

fitted_terms(m.step, parameter = "mu")

Term plots

fitted_terms(m.step, parameter = "sigma")

Parameter interpretation

m.lasso.adaptive2

summary(m.lasso.adaptive2)
******************************************************************
Family:  c("BCTo", "Box-Cox-t-orig.") 

Call:  gamlss(formula = betadiet ~ log(fiber), family = BCTo,  
    data = plasma, trace = FALSE) 

Fitting method: RS() 

------------------------------------------------------------------
Mu link function:  log
Mu Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.37324    0.18403   29.20   <2e-16 ***
log(fiber)   0.86470    0.07384   11.71   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

------------------------------------------------------------------
Sigma link function:  log
Sigma Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.66598    0.06218  -10.71   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

------------------------------------------------------------------
Nu link function:  identity 
Nu Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.16413    0.09424   1.742   0.0826 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

------------------------------------------------------------------
Tau link function:  log 
Tau Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.6357     0.6472   4.072 5.92e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

------------------------------------------------------------------
No. of observations in the fit:  314 
Degrees of Freedom for the fit:  5
      Residual Deg. of Freedom:  309 
                      at cycle:  7 
 
Global Deviance:     5218.318 
            AIC:     5228.318 
            SBC:     5247.065 
******************************************************************

Parameter interpretation: m.lasso.adaptive2

  • The effect of an increase of 1 in log(fiber) is to multiply median betadiet by \exp(0.86470) = 2.37

OR (better)

  • The effect of a 10% increase in fiber is a multiplicative increase in median betadiet of 1.1^{0.86470}=1.086, i.e. 8.6% increase.
  • Parameter interpretation for this model is straightforward because there are covariates (actually just one covariate) in only one model parameter (\mu).

Parameter interpretation: m.lasso.adaptive2

pe_pdf(m.lasso.adaptive2,  term="fiber", x.values = c(5, 10, 15, 20, 25, 30))

Parameter interpretation: m.step

  • What is the effect of cholesterol on the distribution of betadiet?

  • We cannot evaluate the effect of cholesterol on \mu in isolation, because cholesterol simultaneously affects \sigma

  • We need to look at the effect of cholesterol on the whole distribution of betadiet

Parameter interpretation: pe_pdf

  • Plots of the fitted pdf of betadiet for
    • smokstat = 1 (never smoked)
    • fat = 54 (~ 1st quartile)
    • fiber = 15 (~ 3rd quartile)
    • cholesterol = 100, 200, 300, 400, 500
pe_pdf(m.step,  term="cholesterol",
       scenario = list("smokstat"=1, "fat"=54, "fiber"=15),
       x.values = c(100, 200, 300, 400, 500),
       xlim = c(0, 7500))

Parameter interpretation: pe_pdf

  • Plots of the fitted pdf of betadiet for
    • smokstat = 3 (current smoker)
    • fat = 95 (~ 3rd quartile)
    • fiber = 9 (~ 1st quartile)
    • cholesterol = 100, 200, 300, 400, 500
pe_pdf(m.step,  term="cholesterol",
       scenario = list("smokstat"=3, "fat"=95, "fiber"=9),
       x.values = c(100, 200, 300, 400, 500),
       xlim = c(0, 7500))